60 research outputs found

    DeepNano: Deep Recurrent Neural Networks for Base Calling in MinION Nanopore Reads

    Motivation: The MinION device by Oxford Nanopore is the first portable sequencing device. MinION is able to produce very long reads (reads over 100 kbp have been reported); however, it suffers from a high sequencing error rate. In this paper, we show that the error rate can be reduced by improving the base calling process. Results: We present the first open-source DNA base caller for the MinION sequencing platform by Oxford Nanopore. By employing carefully crafted recurrent neural networks, our tool improves the base calling accuracy compared to the default base caller supplied by the manufacturer. This advance may further enhance the applicability of MinION for genome sequencing and various clinical applications. Availability: DeepNano can be downloaded at http://compbio.fmph.uniba.sk/deepnano/. Contact: [email protected]

    Probabilistic approaches to alignment with tandem repeats

    A Possible Role for Short Introns in the Acquisition of Stroma-Targeting Peptides in the Flagellate Euglena gracilis

    The chloroplasts of Euglena gracilis, bounded by three membranes, arose via secondary endosymbiosis of a green alga in a heterotrophic euglenozoan host. Many genes were transferred from the symbiont to the host nucleus. A subset of Euglena nuclear genes of predominantly symbiont, but also host or other, origin have obtained complex presequences required for chloroplast targeting. This study has revealed the presence of short introns (41–93 bp) either in the second half of presequence-encoding regions or shortly downstream of them in nine nucleus-encoded E. gracilis genes for chloroplast proteins (Eno29, GapA, PetA, PetF, PetJ, PsaF, PsbM, PsbO, and PsbW). In addition, the E. gracilis Pbgd gene contains two introns in the second half of the presequence-encoding region and one at the border of the presequence-encoding and mature peptide-encoding regions. Ten of the 12 introns identified in this study within presequence-encoding regions or shortly downstream of them have typical eukaryotic GT/AG borders, are T-rich, and are 45–50 bp long, with pairwise sequence identities ranging from 27 to 61%. Thus single recombination events might have been mediated via these cis-spliced introns. A double crossing over between these cis-spliced introns and trans-spliced introns present in 5′-UTRs of Euglena nuclear genes is also likely to have occurred. Thus introns and exon shuffling could have had an important role in the acquisition of chloroplast targeting signals in E. gracilis. The results are consistent with a late origin of photosynthetic euglenids.

    Finding genes in Schistosoma japonicum: annotating novel genomes with help of extrinsic evidence

    We have developed a novel method for estimating the parameters of hidden Markov models for gene finding in newly sequenced species. Our approach does not rely on curated training data sets, but instead uses extrinsic evidence (including paired-end ditags that have not been used in gene finding previously) and iterative training. This new method is particularly suitable for annotation of species with a large evolutionary distance to the closest annotated species. We have used our approach to produce an initial annotation of more than 16 000 genes in the newly sequenced Schistosoma japonicum draft genome. We established the high quality of our predictions by comparison to full-length cDNAs (withheld from the extrinsic evidence) and to CEGMA core genes. We also evaluated the effectiveness of the new training procedure on the Caenorhabditis elegans genome. ExonHunter and the newest parameter files for the S. japonicum genome are available for download at www.bioinformatics.uwaterloo.ca/downloads/exonhunte